Machine Learning


MGMT 675
AI-Assisted Financial Analysis
Kerry Back

Machine learning in finance

  • Fraud detection
  • Credit risk analysis
  • Return prediction
  • Valuation
  • Text analysis
  • Time series forecasting

Models

  • Linear
  • Trees
  • Neural networks
  • Others

Regression vs Classification

  • Regression means predicting a continuous variable (not necessarily by linear regression).
  • Classification means predicting a categorical variable, either binary or multiclass.

Train and Test

  • Training means fitting a model (like linear regression).
  • Objective is to make accurate predictions on new data.
  • To assess performance, we have to check the model on “new data” (data not used in training).
  • Split data into random train and test subsets. Train on training data. Test on test data.
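
The random split described above can be sketched in a few lines of Python. This is a minimal illustration, not the code Julius would necessarily produce: the function name `split_train_test`, the seed, and the toy arrays are all hypothetical, and a library routine would normally be used instead.

```python
import numpy as np

def split_train_test(X, y, test_frac=0.2, seed=0):
    """Randomly split the rows of X and y into train and test subsets.

    Hypothetical helper for illustration; the seed only makes the split repeatable.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    perm = rng.permutation(n)            # random ordering of the row indices
    n_test = int(round(test_frac * n))   # number of rows held out for testing
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data (made up): 10 observations, 2 features
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

X_train, X_test, y_train, y_test = split_train_test(X, y, test_frac=0.2)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Every row lands in exactly one of the two subsets, so the test rows are never seen during training.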

Test criteria

  • How do we decide if performance is good or bad?
  • For continuous variables,
    • usually want to achieve a low sum of squared errors on the test data
    • equivalently, achieve a high \(R^2\) on the test data.

\[R^2 = 1 - \frac{\sum (y_i - \hat y_i)^2}{\sum (y_i - \bar{y})^2}\]

  • For categorical variables, the criteria are also based on prediction errors, such as the fraction of observations classified incorrectly.
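
As a rough sketch of these criteria, the functions below compute \(R^2\) as defined above and, for categorical targets, the fraction of correct predictions. The function names and the toy inputs are assumptions made for illustration, and accuracy is only one of several possible classification criteria.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)       # sum of squared prediction errors
    sst = np.sum((y - y.mean()) ** 2)    # sum of squared deviations from the mean
    return 1.0 - sse / sst

def accuracy(labels, predicted):
    """Fraction of categorical predictions that match the true labels."""
    labels = np.asarray(labels)
    predicted = np.asarray(predicted)
    return float(np.mean(labels == predicted))

# Toy values (made up for illustration)
y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))                     # perfect predictions -> 1.0
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean -> 0.0
print(accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))  # 0.75
```

Note that on test data \(R^2\) can be negative: that happens when the model's squared errors exceed those from simply predicting the test-set mean.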

Example data

  • Download ml1.xlsx from the course website
  • Upload it to Julius and ask Julius to read it and describe it.
  • The data was created by generating 51 sets of 100 standard normals.
    • The first 50 sets are labeled x1, …, x50.
    • The last set was used as the noise to generate y1 as x1 + noise.
    • So, x2, …, x50 are irrelevant for y1.
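
A data set with the same structure could be generated along the following lines. This is only a sketch of the construction described above: the seed and the array layout are assumptions, so it will not reproduce the actual values in ml1.xlsx.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, chosen for repeatability

# 51 sets (columns) of 100 standard normal draws
draws = rng.standard_normal((100, 51))

X = draws[:, :50]        # first 50 sets: the features x1, ..., x50
noise = draws[:, 50]     # last set: the noise
y1 = X[:, 0] + noise     # y1 depends only on x1; x2, ..., x50 are irrelevant

print(X.shape, y1.shape)  # (100, 50) (100,)
```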

Linear regression example

  • Ask Julius to do a train-test split of the data with 20% of the data in the test set.
  • Ask Julius to train a linear regression on the training data with x1, …, x50 as the features and y1 as the target.
  • Ask Julius to compute the R-squared on the test data.
  • Ask Julius to report the parameter estimates.
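
For reference, the four steps above might look roughly like the following in Python. This sketch uses simulated data with the same structure as ml1.xlsx rather than the file itself, fits ordinary least squares with NumPy instead of whatever library Julius chooses, and includes an intercept; the seed and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed

# Simulated stand-in for ml1.xlsx: 100 observations, 50 features, y1 = x1 + noise
X = rng.standard_normal((100, 50))
y1 = X[:, 0] + rng.standard_normal(100)

# Step 1: random train-test split with 20% of the data in the test set
perm = rng.permutation(100)
test_idx, train_idx = perm[:20], perm[20:]

def add_intercept(A):
    """Prepend a column of ones so the regression has an intercept."""
    return np.column_stack([np.ones(len(A)), A])

# Step 2: fit linear regression (ordinary least squares) on the training data
coef, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y1[train_idx], rcond=None)

# Step 3: R-squared on the test data
y_test = y1[test_idx]
y_hat = add_intercept(X[test_idx]) @ coef
r2_test = 1 - np.sum((y_test - y_hat) ** 2) / np.sum((y_test - y_test.mean()) ** 2)

# Step 4: report the parameter estimates (intercept first, then x1, ..., x50)
print("test R-squared:", round(float(r2_test), 3))
print("intercept:", round(float(coef[0]), 3))
for j, b in enumerate(coef[1:], start=1):
    print(f"x{j}: {b:.3f}")
```

Because x2, …, x50 are irrelevant by construction, any nonzero estimates on those features reflect noise in the training sample rather than a real relationship.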

TBD